1. Introduction

The Turing machines commonly used and studied in computer science have sepa- 
rate tapes for input/output and for storage, so that we can conveniently study both 
storage as a dynamic resource and the more complex storage structures required for 
efficient implementation of practical algorithms [HS65]. Early researchers [MRF67] 
asked specifically whether two-head storage is more powerful if both heads are on 
the same one-dimensional storage tape than if they are on separate one-dimensional 
tapes, an issue of whether shared sequential storage is more powerful than separate 
sequential storage. Our result settles the longstanding conjecture that it is. 

In a broader context, there are a number of natural structural parameters for 
the storage tapes of a Turing machine. These include the number of tapes, the 
dimension of the tapes, and the number of heads on each tape. It is natural to 
conjecture that a deficiency in any such parameter is significant and cannot be 
fully compensated for by advantages in the others. For the most part, this has 
indeed turned out to be the case, although the proofs have been disproportionately 
difficult [Ra63, He66, Gr77, Aa74, PSS81, Pa82, DGPR84, Ma85, LV88, LLV92, 
MSST93, PSSN90]. 

The case of deficiency in the number of heads allowed on each tape has turned out 
to be the most delicate, because it involves a surprise: A larger number of single- 
head tapes can compensate for the absence of multihead tapes [MRF67, FMR72, 
LS81]. For example, four single-head tapes suffice for general simulation of a two- 
head tape unit, without any time loss at all [LS81]. The remaining question is just 
what, if anything, is the advantage of multihead tapes. 
The simplest version of the question is whether a two-head tape is more powerful
than two single-head tapes. In the case of multidimensional "tapes", Paul has
shown that it is [Pa84]. His proof involves using the two-head tape to write, and 
occasionally to retrieve parts of, algorithmically incompressible bit patterns. Be- 
cause the diameter of the pattern (and hence the retrieval times) can be kept much 
smaller than its volume, no fast simulator would ever have time to perform any 
significant revision or copying of its representation of the bit pattern. On ordinary 
one-dimensional tapes, however, retrievals take time that is not small compared to 
the volume of data, and we cannot so easily focus on a nearly static representation 
of the data. We need some more subtle way to rule out all (possibly very obscure) 
copying methods that a two-tape machine might employ to keep up with its mission 
of fast simulation. Our argument below does finally get a handle on this elusive 
"copying" issue, making use of a lemma formulated more than ten years ago with 
this goal already in mind [Vi84, Far-Out Lemma below].

Our specific result is that no Turing machine with just two single-head
one-dimensional storage tapes can recognize the following language in real time:¹

$L = \{\, x2x' \mid x \in \{0,1\}^* \text{ and } x' \text{ is a prefix of } x \,\}$.

With a two-head tape, a Turing machine can easily recognize L in real time. 
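To make the easy direction concrete, here is a minimal Python sketch of on-line recognition of L with two heads on one shared tape. It is only an illustration under stated assumptions: the function name `recognize_online`, the use of a Python list as the tape, and the one-verdict-per-symbol output format are ours, not the paper's. One head appends the bits of x, the other waits at the left end and, once the symbol 2 arrives, compares each later symbol against the stored copy, doing a constant amount of work per input symbol.

```python
def recognize_online(symbols):
    """Return one verdict per input symbol: is the prefix read so far in L?

    Illustrative sketch only (names and representation are assumptions).
    Symbols before the marker '2' are assumed to be '0' or '1'.
    """
    tape = []            # the single shared storage tape
    write_head = 0       # head 1: appends the bits of x
    read_head = 0        # head 2: waits at the left end, then rereads x
    seen_marker = False  # has the symbol 2 been read yet?
    alive = True         # False once a mismatch or overrun has occurred
    verdicts = []
    for s in symbols:
        if not seen_marker:
            if s == "2":
                seen_marker = True      # x2 followed by the empty prefix is in L
            else:
                tape.append(s)          # head 1 writes and moves right
                write_head += 1
            verdicts.append(seen_marker)
        else:
            # Head 2 compares the next symbol of x' with the stored bit of x.
            if alive and read_head < write_head and s == tape[read_head]:
                read_head += 1
            else:
                alive = False
            verdicts.append(alive)
    return verdicts
```

For example, `recognize_online("011201")` accepts the prefixes `0112`, `01120`, and `011201`, while on `011200` the last verdict is negative because the second retrieved symbol does not match.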

Our result incidentally gives us a tight bound on the number of single-head tapes
needed to recognize the particular language L in real time, since three do suffice 
[MRF67, FMR72]. Thus L is another example of a language with "number-of-tapes 
complexity" 3, rather different from the one first given by Aanderaa [Aa74, PSS81]. 
(For the latter, even a two-head tape, even if enhanced by instantaneous head-to- 
head jumps and allowed to operate probabilistically, was not enough [PSSN90].) 

Historically, multihead tapes were introduced in Hartmanis and Stearns' seminal
paper [HS65], which outlined a linear-time simulation of an h-head tape, using
some larger number of ordinary single-head tapes. Stoß [St70] later reduced the
number of single-head tapes to just h. Noting the existence of an easy real-time
simulation in the other direction, Bečvář [Be65] explicitly raised the question of
real-time simulation of an h-head tape using only single-head tapes. Meyer, Rosenberg,
and Fischer devised the first such simulation [MRF67]; and others later reduced the
number of tapes [FMR72, Be74, LS81], ultimately to just 4h − 4. We are the first to
show that this number cannot always be reduced to just h, although both the extra
power of multihead tapes and the more-than-two-tape complexity of the particular
language L have been longstanding conjectures [FMR72, LS81, Vi84, Pa84].

2. Tools 

Overlap. Part of our strategy will be to find within any computation a sufficiently
long subcomputation that is sufficiently well behaved for the rest of our analysis.

¹ On-line recognition requires a verdict for each input prefix before the next input symbol
is read, and real-time recognition is on-line recognition with some constant delay bound on the
number of steps between the reading of successive input symbols. Note that even a single-tape
Turing machine can recognize L on-line in cumulative linear time; but this involves an unbounded
(linear-time) delay to "rewind" after reading the symbol 2. In cumulative linear time, in fact,
general on-line simulation of a two-head one-dimensional tape is possible using just two single-head
tapes [St70]; so real time is a stronger notion of "without time loss". (There is an analogous
linear-time simulation for two-dimensional tapes [ST89], but the question is open for higher dimensions.)

The behavior we seek involves limitations on repeated access to storage locations,
which we call "overlap" [Aa74, PSSN90].

Our overlap lemma is purely combinatorial, and does not depend at all on the 
nature of our computations or the "storage locations" corresponding to their steps. 
Nor does it depend on the computational significance of the steps designated as 
"distinguished". The use of computational terminology would only obscure the 
lemma's formulation and proof, so we avoid it. 

An overlap event in a sequence $S = \ell_1, \ldots, \ell_T$ (of "storage locations", in our
application) is a pair $(i, j)$ of indices with $1 \le i < j \le T$ and
$\ell_i = \ell_j \notin \{\ell_{i+1}, \ldots, \ell_{j-1}\}$ ("visit and soonest revisit"). If $\omega_t(S)$ is the
number of such overlap events "straddling" t (i.e., with $i \le t$ but $j > t$), then the
sequence's internal overlap, $\omega(S)$, is $\max\{\omega_t(S) \mid 1 \le t < T\}$. The relative
internal overlap is $\omega(S)/T$.

Here is an example: In the sequence 

S = cow, pig, horse, pig, sheep, horse, pig, 

the overlap events are (2, 4), (4, 7), and (3, 6). For t from 1 up to 6, the respective
values of $\omega_t(S)$ are 0, 1, 2, 2, 2, and 1; so $\omega(S)$ is 2, and the relative internal overlap
is 2/7.
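The definitions above can be checked mechanically. The following Python sketch is our own illustration (the function names are assumptions, not notation from the paper); it uses 1-indexed positions and counts an event (i, j) as straddling t when i ≤ t < j, which is the reading that reproduces the values in the example.

```python
def overlap_events(seq):
    """List the overlap events (i, j), 1-indexed: j is the soonest revisit of position i."""
    last_seen = {}   # location -> most recent position at which it occurred
    events = []
    for j, loc in enumerate(seq, start=1):
        if loc in last_seen:
            events.append((last_seen[loc], j))
        last_seen[loc] = j
    return events


def straddle_counts(seq):
    """Return the number of events straddling t, for t = 1, ..., T-1."""
    events = overlap_events(seq)
    return [sum(1 for i, j in events if i <= t < j) for t in range(1, len(seq))]


def internal_overlap(seq):
    """Return the maximum straddle count over all t (0 for sequences of length < 2)."""
    return max(straddle_counts(seq), default=0)


S = ["cow", "pig", "horse", "pig", "sheep", "horse", "pig"]
print(sorted(overlap_events(S)))     # [(2, 4), (3, 6), (4, 7)]
print(straddle_counts(S))            # [0, 1, 2, 2, 2, 1]
print(internal_overlap(S) / len(S))  # relative internal overlap, 2/7
```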

(In our setting below, we apply these definitions to the sequence of storage
locations shifted to on the successive steps of a computation or subcomputation.
Without loss of generality, we assume that a multihead or multitape machine shifts
exactly one head on each step.)

The lemma we now formulate guarantees the existence of a contiguous subsequence
that has "small" relative internal overlap (quantified using $\varepsilon$), but that is
itself still "long" (quantified using $\varepsilon'$). The lemma additionally guarantees that the
subsequence can include a quite fair share of a set of "distinguished positions" of
our choice in the original sequence.

(The "distinguished positions" in our setting will be the items in the sequence that
correspond to a large "matching", a notion we define later, especially motivated
by computations involving two heads.)

Overlap Lemma. Consider any $\delta < 1$ and any $\varepsilon > 0$. Every sequence S (of
length T, say) with "distinguished-position" density at least $\delta$ has a long contiguous
subsequence, of length at least $\varepsilon' T$ for some constant $\varepsilon' > 0$ that depends only on $\delta$
and $\varepsilon$, with distinguished-position density still at least $\delta/2$, and with relative internal
overlap less than $\varepsilon$.

Proof. Without loss of generality, assume T is a power of 2 that is large in terms
of $\delta$ and $\varepsilon$. (If T is not a power of 2, then we can discard an appropriate prefix and
suffix of combined length less than half the total, to obtain such a sequence with
distinguished-position density still at least $\delta$.) We consider only the sequence's two
halves, four quarters, eight eighths, etc. Of these, we seek many with sufficient
distinguished-position density (at least $\delta/2$) and with internal overlap accounted
for by distinct overlap events, planning then to use the fact that each item in S can
serve as the second component of at most one overlap event.

Within each candidate subsequence S', we can select a particular straddle point t
for which $\omega(S') = \omega_t(S')$, and then we can designate the $\omega(S')$ overlap events
within S' that straddle position t as the ones we consider counting. The designated
overlap events in S' can be shared by another interval only if that interval includes
the corresponding selected straddle point t.

We consider the candidate sequences in order of decreasing length (i.e., halves,
then quarters, then eighths, etc.). At each partitioning level, at least fraction $\delta/2$ of
the subsequences must have distinguished-position density at least $\delta/2$. (Otherwise,
we cannot possibly have the guaranteed total $\delta T$ distinguished positions in the
subsequences on that level, since $(\delta/2) \cdot 1 + (1 - \delta/2) \cdot \delta/2 < \delta$.) Among these, we
can count distinct overlap from

$\lceil (\delta/2)2 \rceil = \lceil \delta \rceil \ge \delta/2 - 1/2$ halves,

$\lceil (\delta/2)4 \rceil - \lceil (\delta/2)2 \rceil = \lceil 2\delta \rceil - \lceil \delta \rceil \ge \delta - 1/2$ quarters,

$\lceil (\delta/2)8 \rceil - \lceil (\delta/2)4 \rceil = \lceil 4\delta \rceil - \lceil 2\delta \rceil \ge 2\delta - 1/2$ eighths,

$\lceil (\delta/2)16 \rceil - \lceil (\delta/2)8 \rceil = \lceil 8\delta \rceil - \lceil 4\delta \rceil \ge 4\delta - 1/2$ sixteenths,

etc.

Unless we find one of these sequences that has relative internal overlap less than $\varepsilon$,
this accounts, at the $i$th level, for at least

$(2^{i-2}\delta - 1/2)(\varepsilon T/2^i) = \varepsilon\delta T/4 - \varepsilon T/2^{i+1}$

distinct overlap events, and hence for more than T distinct overlap events after
$\lceil (4 + 2\varepsilon)/(\varepsilon\delta) \rceil$ levels. This is impossible, so we must find the desired low-overlap
sequence at one of these levels. □
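The search that the proof reasons about (halves, then quarters, then eighths, and so on) can be mimicked by brute force. The sketch below is only an illustration, not part of the proof: the function names, the 0-indexed representation of distinguished positions, and the assumption that the sequence length is a power of 2 are ours. It returns the first dyadic block whose distinguished-position density is at least δ/2 and whose relative internal overlap is below ε.

```python
def internal_overlap(seq):
    """Maximum number of (visit, soonest revisit) pairs straddling any cut of seq."""
    last_seen, events = {}, []
    for j, loc in enumerate(seq):
        if loc in last_seen:
            events.append((last_seen[loc], j))
        last_seen[loc] = j
    # 0-indexed: event (i, j) straddles the cut after position t when i <= t < j.
    return max((sum(1 for i, j in events if i <= t < j)
                for t in range(len(seq) - 1)), default=0)


def find_low_overlap_block(seq, distinguished, delta, eps):
    """Search halves, quarters, ... for a block (lo, hi) meeting both criteria.

    `distinguished` is a collection of 0-indexed positions (an assumed encoding);
    len(seq) is assumed to be a power of 2. Returns None if no block qualifies.
    """
    total = len(seq)
    parts = 2
    while parts <= total:
        size = total // parts
        for b in range(parts):
            lo, hi = b * size, (b + 1) * size
            density = sum(1 for p in distinguished if lo <= p < hi) / size
            if density >= delta / 2 and internal_overlap(seq[lo:hi]) / size < eps:
                return lo, hi
        parts *= 2
    return None
```

For instance, on the sequence `[1, 2, 3, 4, 1, 2, 3, 4]` with distinguished positions `{0, 1, 4, 5}`, δ = 0.5, and ε = 0.25, the whole sequence has relative internal overlap 1/2, but the first half already qualifies.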

Kolmogorov Complexity. A key to the tractability of our arguments (and most 
of the recent ones we have cited [Pa82, Pa84, PSS81, DGPR84, Ma85, LV88, LLV92, 
PSSN90, Vi84]) is the use of "incompressible data". Input strings that involve such 
data tend to be the hardest and least subject to special handling. 

We define incompressibility in terms of Kolmogorov's robust notion of descriptional
complexity [Ko65]. Informally, the Kolmogorov complexity $K(x)$ of a binary
string x is the length of the shortest binary program (for a fixed reference universal
machine) that prints x as its only output and then halts. A string x is incompressible
if $K(x)$ is at least $|x|$, the approximate length of a program that simply
includes all of x literally. Similarly, a string x is "nearly" incompressible if $K(x)$ is
"almost as large as" $|x|$.

The appropriate standard for "almost as large" above can depend on the context,
a typical choice being "$K(x) \ge |x| - O(\log |x|)$". The latter implicitly involves
some constant, however, the careful choice of which might be an additional source
of confusion in our many-parameter context. A less typical but more absolute
standard such as "$K(x) \ge |x| - \sqrt{|x|}$" completely avoids the introduction of yet
another constant.
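Kolmogorov complexity itself is not computable, so no program can test incompressibility. As a rough intuition pump only, the length of a compressed encoding gives an upper bound on descriptional complexity (up to an additive constant for the decompressor). The sketch below uses Python's standard `zlib` module as a stand-in compressor; the helper name `compressed_bits` and the two sample strings are our own illustrative choices, and a pseudo-random string is of course highly compressible in the Kolmogorov sense (seed plus generator), even though `zlib` cannot exploit that.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Bit length of a zlib encoding of data: a crude upper-bound proxy, not K(x)."""
    return 8 * len(zlib.compress(data, 9))


rng = random.Random(0)                                      # fixed seed for repeatability
random_like = bytes(rng.getrandbits(8) for _ in range(4096))
repetitive = b"01" * 2048                                   # same length, obvious pattern

print(8 * len(random_like), compressed_bits(random_like))   # barely changes, if at all
print(8 * len(repetitive), compressed_bits(repetitive))     # shrinks dramatically
```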

Similarly, the conditional Kolmogorov complexity of x with respect to y, denoted
by $K(x|y)$, is the length of the shortest program that, with extra information y,
prints x. And a string x is incompressible or nearly incompressible relative to y
if $K(x|y)$ is large in the appropriate sense. If, at the opposite extreme, $K(x|y)$ is
so small that $|x| - K(x|y)$ is "almost as large as" $|x|$, then we say that y codes x
[CTPR85].

There are a few well-known facts about these notions that we will use freely,
sometimes only implicitly. Proofs and elaboration, when they are not sufficiently 
obvious, can be found in the literature [especially LV93]. The simplest is that, 
both absolutely and relative to any fixed string y, there are incompressible strings 
of every length, and that most strings are nearly incompressible, by any standard. 
Another easy one is that significantly long subwords of an incompressible string 
are themselves nearly incompressible, even relative to the rest of the string. More 



TWO HEADS ARE BETTER THAN TWO TAPES 






striking is Kolmogorov and Levin's "symmetry of information" [ZL70]: $K(x) - K(x|y)$
is very nearly equal to $K(y) - K(y|x)$ (up to an additive term that is
logarithmic in the Kolmogorov complexity of the binary encoding of the pair (x, y));
i.e., y is always approximately as helpful in describing x as vice versa! (Admittedly,
the word "helpful" can be misleading here; the result says nothing at all about the
relative computational complexity of generating the two strings from each other.)
All these facts can be relativized or further relativized; for example, symmetry of
information also holds in the presence of help from any fixed string z:

$K(x \mid z) - K(x|y \mid z) \approx K(y \mid z) - K(y|x \mid z)$.

3. Strategy 

For the sake of argument, suppose some two-tape Turing machine M does recognize
$\{\, x2x' \mid x \in \{0,1\}^* \text{ and } x' \text{ is a prefix of } x \,\}$ in real time. Once a binary string
$x \in \{0,1\}^*$ has been read by M, the contents of M's tapes tend to serve as a very
redundant representation of prefixes of x, because M has to be prepared to retrieve
them at any time. (Our problem and this observation were motivation for Chung,
Tarjan, Paul, and Reischuk's investigation of "robust codings of strings by pairs
of strings" [CTPR85].) One way around this is for M to keep one or the other
of its tapes' heads stationed at some stored record of a long prefix of x, as
"insurance". The early real-time multihead simulations of buffers [MRF67, FMR72,
Be74] do follow this strategy, but we show that a machine with only two tapes
will not be able to afford always to use one in this way for insurance: There will
have to be a significant subcomputation in which the heads on both tapes "keep
moving", even "essentially monotonically", essentially as they would for
straightforward "copying". Under these circumstances, in fact, we will be able to use part
of the computation itself, rather than the combination of the two tapes' contents,
as the very redundant representation, to contradict the following lemma, which we
prove later.

Anti-Holography Lemma. Consider any constant C, and consider any binary
string x that is long in terms of C, and that is nearly incompressible.² Suppose
$y = y_1 y_2 \cdots y_k$ (each $y_i$ a binary string) is a "representation" with the following
properties:

1. $|y| \le C|x|$;

2. For each $\ell \le k$, x's prefix of length $\ell|x|/k$ is coded by $y_{i+1} \ldots y_{i+\ell}$ for each
$i \le k - \ell$.

Then k is bounded by some constant that depends only on C.

For (the binary representation of) a T-step subcomputation by M to serve as a
representation y that contradicts this lemma, we need the following:

1. A nearly incompressible input prefix x of length at least $|y|/C = \Theta(T/C)$ was
read before the subcomputation.

2. There is a parse of the subcomputation into a large number k of pieces so
that each prefix of x of length $\ell|x|/k$ is coded in every contiguous sequence
of $\ell$ pieces.

3. k is (too) large in terms of C.

² We need $K(x) > \delta|x|$, for some fraction $\delta$ that is determined by C; so certainly
$K(x) \ge |x| - \sqrt{|x|}$ will be enough if x is long.

We accomplish these things by finding a subcomputation that has a spatially monotonic
"matching" that is both long and so well separated spatially that needed
information on tape contents cannot be spread over many pieces of the subcomputation.

The first step is to define and find "a large matching", and the second is to refine
it in a suitable way. In a two-tape or two-head computation or subcomputation, 
a monotonic sequence of time instants is a matching if neither head scans the 
same tape square at more than one of the time instants. (So there is actually a 
separate one-to-one "matching" for each head, between the time instants and the 
tape squares scanned by that head at those times.) We prove the following lemma 
later on. 

Large-Matching Lemma. If a two-tape Turing machine recognizes

$\{\, x2x' \mid x \in \{0,1\}^* \text{ and } x' \text{ is a prefix of } x \,\}$

in real time, then its computation on an incompressible binary input of length n
includes a matching of length $\Omega(n)$. (The implicit constant does depend on the
machine.)

(Note that this lemma does not hold if the two heads can be on the same tape.)

In a two-tape or two-head computation or subcomputation, a matching is
(spatially) monotonic if, for each of the two heads, the spatial order of the corresponding
sequence of tape squares being scanned at the specified time instants is strictly
left-to-right or strictly right-to-left. The minimum separation of a monotonic matching
is the least distance between successive tape squares in either corresponding
sequence of tape squares.
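The three notions just defined (matching, monotonic matching, minimum separation) are easy to state as predicates on head-position traces. The following Python sketch is a hedged illustration: the representation of a computation as one list of head positions per head, indexed by time, and all function names are assumptions of ours rather than anything fixed by the paper.

```python
def scanned_squares(times, trace):
    """Tape squares scanned by one head at the given time instants."""
    return [trace[t] for t in times]


def is_matching(times, head_traces):
    """True if times is increasing and no head scans the same square at two of them."""
    if any(a >= b for a, b in zip(times, times[1:])):
        return False
    for trace in head_traces:
        squares = scanned_squares(times, trace)
        if len(set(squares)) != len(squares):
            return False
    return True


def is_monotonic_matching(times, head_traces):
    """True if, in addition, each head's squares run strictly left-to-right or right-to-left."""
    if not is_matching(times, head_traces):
        return False
    for trace in head_traces:
        squares = scanned_squares(times, trace)
        steps = [b - a for a, b in zip(squares, squares[1:])]
        if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
            return False
    return True


def minimum_separation(times, head_traces):
    """Least distance between successive matched squares, over all heads (None if < 2 instants)."""
    return min((abs(b - a)
                for trace in head_traces
                for a, b in zip(scanned_squares(times, trace),
                                scanned_squares(times, trace)[1:])),
               default=None)
```

As a small example, with head traces `[0, 1, 2, 3, 4, 5, 6]` and `[0, -1, -2, -3, -2, -3, -4]`, the instants `[0, 3, 6]` form a monotonic matching with minimum separation 1, whereas `[0, 2, 4, 6]` is not a matching because the second head scans square −2 at both time 2 and time 4.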

Monotonization Lemma. Suppose $\varepsilon > 0$ is small in terms of $\delta > 0$. If a two-tape
(sub)computation of length T has a matching of length at least $\delta T$ and internal
overlap less than $\varepsilon T$, then the computation has a monotonic submatching of length
$\Omega(\delta/\varepsilon)$ and minimum separation $\Omega(\varepsilon T)$. (The implicit constants here really are
constant, not depending even on the machine; for use below, let c denote the smaller
of them.)

Proof. Without loss of generality, assume T is large in terms of $\delta$ and $\varepsilon$. Parse
the computation into about $\delta/(2\varepsilon)$ subcomputations, each including a matching
of length at least $2\varepsilon T$. Each subcomputation involves a contiguous set of at least
$2\varepsilon T$ distinct tape squares on each tape. The sets from successive subcomputations
touch or intersect, but the overlap bound limits their intersection to less than $\varepsilon T$
tape squares. If we omit every second subcomputation's set, therefore, we get a
spatially monotonic sequence of about $\delta/(4\varepsilon)$ nonintersecting sets on each tape. If
we further omit every second remaining set, then we get a monotonic sequence of
about $\delta/(8\varepsilon)$ sets on each tape, with successive sets separated by at least $2\varepsilon T$ tape
squares. To get the desired submatching, simply include one matching-time instant
from each of the $\delta/(8\varepsilon)$ remaining subcomputations. □

4. Careful Argument 

Now let us put together the whole argument, taking care to introduce the "constants"
M (and d), $\delta$, $\varepsilon$, and $\varepsilon'$ in an appropriate order, all before the input length n
and the particular input string $x_0$ on which we focus. Each of these values is allowed
to depend on earlier ones, but not on later ones.

For the sake of argument, suppose some two-tape Turing machine M does recognize
the language $\{\, x2x' \mid x \in \{0,1\}^* \text{ and } x' \text{ is a prefix of } x \,\}$ in real time, say
with delay bound d. Citing the Large-Matching Lemma, take $\delta > 0$ small enough
so that M's computation on any incompressible input string $x \in \{0,1\}^*$ includes a
matching of length at least $\delta|x|$. Let $\varepsilon > 0$ be small in terms of d, $\delta$, and M; and let
$\varepsilon'$ be small in terms of d, $\delta$, and $\varepsilon$. Let n be large in terms of all these constants,
and let $x_0$ be any incompressible string of n bits.

Split the computation by M on input $x_0$ into an initial subcomputation and a
final subcomputation, each including a matching of length $\lfloor \delta n/2 \rfloor$. The number of
steps in each of these subcomputations will lie between $\lfloor \delta n/2 \rfloor$ and dn. Therefore,
the initial one will involve a prefix of $x_0$ of length at least $(1/d)(\delta n/2) = n\delta/(2d)$,
and the final one will have "match density" at least $(\delta n/2)/(dn) = \delta/(2d)$.

Applying the Overlap Lemma to the final subcomputation above, we obtain a
subcomputation of some length $T \ge \varepsilon' n$, with match density at least $\delta/(4d)$ and
relative internal overlap less than $\varepsilon$, provided $\varepsilon'$ was chosen small enough in terms
of d, $\delta$, and $\varepsilon$. Then applying the Monotonization Lemma, we obtain within this
subcomputation a monotonic submatching of minimum separation at least $c\varepsilon T$, and
of length 2k + 1, where 2k + 1 is either $\lceil c(\delta/(4d))/\varepsilon \rceil$ or $\lceil c(\delta/(4d))/\varepsilon \rceil - 1$ (whichever
is odd). If $\varepsilon$ was chosen small, then k will be large. Note that $k\varepsilon$ is approximately
equal to a constant $c\delta/(8d)$ that depends only on M.

To obtain the desired contradiction to the Anti-Holography Lemma, take y to
be a complete record of the T-step subcomputation obtained above, including the
symbols scanned and written by each head on each step. To obtain $y_1, y_2, \ldots, y_k$,
split this record at every second one of the time instants corresponding to the
matching of length 2k + 1, starting with the third and ending with the third-to-last.
Take x to be $x_0$'s prefix of length $kc\varepsilon T/(2d)$. Since $\delta n/(2d)$ exceeds this
length (assuming we chose our constants appropriately), all of x was already read
during the initial subcomputation above, and hence before the beginning of the
subcomputation described by y. Note that, for some constant D that depends only
on M,

$|y| \le DT = \frac{2dD}{kc\varepsilon}|x| \approx \frac{16d^2D}{c^2\delta}|x|$,

and that k is large (in fact, too large for the Anti-Holography Lemma) in terms of
the constant $C = 16d^2D/(c^2\delta)$, assuming we chose $\varepsilon$ small enough.

To see that x's prefix of length $\ell|x|/k$ is coded by $y_{i+1} \ldots y_{i+\ell}$ (for each appropriate
$\ell$ and i), suppose we interrupt M with "the command to begin retrieval" (i.e.,
with the symbol 2) at the $(2i + \ell + 1)$st of the time instants corresponding to the
matching of length 2k + 1. Since M must be able to check the prefix of length $\ell|x|/k$
by reading only the information within distance $d\ell|x|/k = \ell c\varepsilon T/2$ of its heads, that
prefix must be coded by that information. Since this distance in each direction is
just $\ell/2$ times the minimum separation of the matching, and since the matching is
monotonic, the same information is available within the subcomputation record y,
between the matching's time instants $2i + \ell + 1 - \lceil \ell/2 \rceil$ and $2i + \ell + 1 + \lceil \ell/2 \rceil$. Since
$y_{i+1} \ldots y_{i+\ell}$ runs from the matching's time instant $2i + 1 \le 2i + \ell + 1 - \lceil \ell/2 \rceil$ to
the matching's time instant $2i + 2\ell + 1 \ge 2i + \ell + 1 + \lceil \ell/2 \rceil$, it too codes the desired
prefix.

5. Proof of Anti-Holography Lemma

Without loss of generality, assume k is equal to $2^e$ for some integer exponent e.³
Then the target constant can be 2^*^"^. Again without loss of generality, assume 
k is at most this target constant times two.⁴ Finally, without loss of generality,
assume that $|x| = n$ is divisible by k, with $x = x_1 \ldots x_k$ and $|x_i| = n/k$ for every i.⁵

To obtain short descriptions of y, we abbreviate many of its subwords in terms 
of just a few prefixes of x, using the symmetry of information. For each j < e, and 
for j = e — 1 in particular, this will yield 

$K(y \mid x_1 \ldots x_{2^j}) \le |y| - (1 + j/2)n + O(\log n)$.

Unless k is smaller than 2^*-^"^, e — 1 will be so large that this will imply that 
$x_1 \ldots x_{2^{e-1}}$ codes y. Since y in turn codes all of $x = x_1 \ldots x_{2^e}$, this will mean
that the first half of x codes the whole string, contradicting the incompressibility 
assumption for x. 

By induction on j (j = 0, 1, ..., e), we actually prove "more local" bounds that
imply the ones above: For each appropriate i ($i = 0, 1, \ldots, k - 2^j$),

$K(y_{i+1} \ldots y_{i+2^j} \mid x_1 \ldots x_{2^j}) \le |y_{i+1} \ldots y_{i+2^j}| - 2^j(1 + j/2)n/k + O(\log n)$.

Both the base case and the induction step are applications of an intuitively clear
corollary, the Further-Abbreviation Lemma below, of the symmetry of information.
For the base case, we apply the lemma with $y'$ equal to $y_{i+1}$, $x'$ equal to the null
string, and $x''$ equal to $x_1$, to get the desired bound on $K(y'|x'')$:

$K(y'|x'') \le K(y') - K(x'') + O(\log n)$
$\le |y'| - n/k + O(\log n)$.

For the induction step, we let $y'' = y_{i+1} \ldots y_{i+2^j}$ and $y''' = y_{i+2^j+1} \ldots y_{i+2^{j+1}}$, and
apply the lemma with $y'$ equal to $y''y'''$, $x'$ equal to $x_1 \ldots x_{2^j}$, and $x''$ equal to
$x_{2^j+1} \ldots x_{2^{j+1}}$, to get the desired bound on $K(y'|x'x'')$:

$K(y'|x'x'') \le K(y'|x') - K(x'') + O(\log n)$

$\le K(y''|x') + K(y'''|x') - K(x'') + O(\log n)$

$\le |y''| + |y'''| - 2 \cdot 2^j(1 + j/2)n/k - 2^j n/k + O(\log n)$

$= |y''| + |y'''| - 2^{j+1}(1 + (j+1)/2)n/k + O(\log n)$. □

Further-Abbreviation Lemma. Assume $y'$, $x'$, and $x''$ are strings of length
$O(n)$, with

$K(x''|y') = O(\log n)$

and

$K(x''|x') = K(x'') - O(\log n)$.

³ If it is not, then just reduce it until it is.

⁴ Otherwise, pair up $y_i$'s to reduce k by factors of 2 until it is.

⁵ If x's length is not divisible by k, then just discard at most its last 2^^"^ bits, until its length
is divisible by k.

(I.e., $y'$ codes $x''$, which is nearly incompressible relative to $x'$.) Then

$K(y'|x'x'') \le K(y'|x') - K(x'') + O(\log n)$.

Proof. Let $d(u|v)$ denote a shortest description of u in terms of v, so that $|d(u|v)| = K(u|v)$.
Then

$K(y'|x'x'') \le K(d(y'|x') \mid x'x'') + O(\log n)$

$\le K(d(y'|x') \mid x'' \mid x') + O(\log n)$

$\le K(d(y'|x') \mid x') - K(x'' \mid x') + K(x'' \mid d(y'|x') \mid x') + O(\log n)$

$\le K(y'|x') - K(x''|x') + K(x''|y') + O(\log n)$

$\le K(y'|x') - K(x'') + O(\log n)$. □

6. Proof of Large-Matching Lemma 

Our proof of the Large-Matching Lemma is based on an earlier theorem of
Vitányi:

Far-Out Lemma [Vi84]⁶. If a two-tape Turing machine recognizes

$\{\, x2x' \mid x \in \{0,1\}^* \text{ and } x' \text{ is a prefix of } x \,\}$

in real time, then its "worst-case closest head position"⁷ on incompressible inputs
$x \in \{0,1\}^n$ is $\Omega(n)$.

In other words, incompressible binary data is guaranteed at some point to drive
both heads of such a machine simultaneously far from their original positions. By
the continuity of sequential access, of course, this means that the heads actually
spend long intervals of time simultaneously far from their original positions; and
this is the fact that we exploit.

We actually show that even any two-head Turing machine (with both heads on 
the same one-dimensional tape) that recognizes our language and that satisfies the 
conclusion of the Far-Out Lemma also satisfies the desired conclusion of the Large- 
Matching Lemma. (Of course the obvious two-head machine, that does recognize 
our language in real time, does not satisfy the conclusion of either lemma.) This 
simplifies the exposition, since we have only one tape to talk about. Note that the 
"matching" notion does make sense even when both heads are on the same tape. 

As earlier, let us take explicit care to introduce our "constants" in an appropriate
order. Consider any two-head Turing machine M alleged to recognize

$\{\, x2x' \mid x \in \{0,1\}^* \text{ and } x' \text{ is a prefix of } x \,\}$

in real time, say with delay bound d, and that satisfies the conclusion of the Far-Out
Lemma. Let c be small enough to serve as the implicit constant in that conclusion.
Let $\varepsilon$ be small in terms of M and c; let $\delta$ be small in terms of M, c, and $\varepsilon$; let
n be large in terms of M, c, $\varepsilon$, and $\delta$; and let x be an incompressible string of n
bits. Exploiting the conclusion of the Far-Out Lemma, parse x into three pieces,
x = uvw, such that uv leaves both heads at least cn tape squares from where they
started and the length of u is $\lfloor cn/(3d) \rfloor = \Theta(n)$.

⁶ For a sketch of the proof, see the appendix below.

⁷ If $p_i(t)$ denotes the net displacement of head i at time t, then the "worst-case closest head
position" is $\max_t \min_i p_i(t)$.

Consider M's computation on uv2u. The first u must be read before either head 
gets as far as even cn/3 tape squares from where it started, but the second u must 
be read while neither head gets closer than 2cn/3 tape squares to where it started. 
During its subcomputation on v, therefore, it seems that M must somehow "copy" 
its representation of u across the intervening cn/3 tape squares. We show that this 
process has to involve a matching larger than $\delta n$.

For the sake of argument, suppose there is not a matching larger than $\delta n$. Then
there must be a maximal matching of size only $m \le \delta n$. We will select some
correspondingly small "interface" through which a description of u must pass. That
interface will involve some rarely crossed boundary at distance between cn/3 and
2cn/3 from the heads' starting position, and some other rarely crossed boundaries
that tightly isolate the 2m tape squares involved in the matching. Since there are
2cn/3 − cn/3 candidates for the former, we can select one that is crossed only a
constant number (bounded in terms of d and c) of times. We will refer to the tape
squares on the respective sides of this selected central boundary as close and far.
By the following purely combinatorial lemma, we can tightly isolate the matched
tape squares with at most 4m additional boundaries, each of which is crossed only
a constant number (bounded in terms of d, c, and our "tightness criterion" $\varepsilon$) of
times.

Tight-Isolation Lemma. Consider a finite sequence S of nonnegative numbers,
the first and last of which are 0. Let some of the separating "commas" be specially
designated; call them "semicolons". For each threshold $\ell \ge 0$, let $S_\ell$ be the
subsequence consisting of the items⁸ that are reachable from the semicolons via items
that exceed $\ell$ (and that themselves exceed $\ell$). Then, for each $\varepsilon > 0$, there is some $\ell$
such that $\ell|S_\ell| < \varepsilon \sum S$, where $\sum S$ denotes the sum of the entire sequence S and
$\ell$ is bounded by some constant that depends only on $\varepsilon$.

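Proof. Let $T = \sum S$, and let $k = 2\lceil 2/\varepsilon \rceil$. Since $2T/k \le \varepsilon T/2 < \varepsilon T$, let us aim for
$\ell|S_\ell| < 2T/k$. If no $\ell$ in $\{\, k^i \mid 0 \le i \le k \,\}$ were to work, then we would have

$2T/k \le k^i |S_{k^i}| \le T$

for every i. But this would lead to the contradiction

$T \ge \sum_{i=0}^{k} k^i \left( |S_{k^i}| - |S_{k^{i+1}}| \right) \ge \sum_{i=0}^{k} (2T/k - T/k) = (k+1)T/k$. □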
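The proof is effectively a search over the thresholds $k^0, k^1, \ldots, k^k$, and that search can be written out directly. In the Python sketch below, the function names and the encoding of a semicolon as a gap index g (meaning the comma between items g − 1 and g, 0-indexed) are our own assumptions for illustration. `reachable_items` computes the index set of $S_\ell$, and `find_threshold` returns the first threshold meeting the proof's target $\ell|S_\ell| < 2T/k$.

```python
import math


def reachable_items(seq, semicolons, threshold):
    """Indices of items reachable from some semicolon through items exceeding threshold.

    A semicolon g denotes the gap between seq[g - 1] and seq[g] (assumed encoding).
    """
    reached = set()
    for g in semicolons:
        i = g - 1
        while i >= 0 and seq[i] > threshold:        # walk left from the gap
            reached.add(i)
            i -= 1
        i = g
        while i < len(seq) and seq[i] > threshold:  # walk right from the gap
            reached.add(i)
            i += 1
    return reached


def find_threshold(seq, semicolons, eps):
    """Try thresholds k^0, ..., k^k with k = 2*ceil(2/eps), as in the proof."""
    total = sum(seq)
    k = 2 * math.ceil(2 / eps)
    for i in range(k + 1):
        level = k ** i
        if level * len(reachable_items(seq, semicolons, level)) < 2 * total / k:
            return level
    return None  # by the lemma, not reached when the sequence has positive sum
```

For example, for the sequence `[0, 3, 5, 1, 0, 7, 2, 0]` with semicolons at gaps 2 and 6 and ε = 0.5, we get k = 8 and 2T/k = 4.5; the threshold 1 already works, since exactly four items (indices 1, 2, 5, 6) are reachable through items exceeding 1.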

In our application, the numbers are the lengths of the crossing sequences associated
with the boundaries between tape squares, their sum is at most dn, and the
semicolons are the matched tape squares. We obtain our desired "isolation
neighborhoods" from the at-most-2m contiguous neighborhoods that comprise $S_\ell$⁹ by
adding one item at each end of each neighborhood. (This might cause neighborhoods
to combine.) This adds at most 4m items to $S_\ell$ and results in nonempty
isolation neighborhoods whose boundary items are at most $\ell$.

⁸ Note that the number of such items can be small even if the number of semicolons is large.
For $\ell$ large enough, in fact, $|S_\ell|$ will be 0.

Actually, the picture is clearer if we select our central boundary after we select
the isolation neighborhoods. Assuming $\varepsilon$ and $\delta$ are chosen appropriately small, this
lets us select a boundary not included in any of the isolation neighborhoods. (There
are at most $4m + |S_\ell| < 2\delta n + \varepsilon dn < cn/6$ boundaries (half the original number of
candidates) to avoid.)

Finally, we use our suggested interface to give a description of u in terms of v that
is too short, say shorter than $|u|/2 \approx cn/(6d)$. (We could substitute a description
this short for u in x to contradict the incompressibility of x.) We claim we can
reconstruct u from M, v, the length of u, and the following information about the
subcomputation of M while reading the v part of input uv:

1. The sequence of all $O(m)$ selected boundary locations.

2. The sequence of all $O(m)$ crossings of these selected boundaries, and their
times (implicitly or explicitly including the corresponding input positions).

3. The following information for each close-to-far crossing, and for the end of 
the subcomputation: 

• M's control state and head positions. 

• The full content of every isolation neighborhood. 

4. The following information for each crossing out of an isolation neighborhood: 

• The full content of that isolation neighborhood. 

• The full content of the isolation neighborhood in which the other head
remains,¹⁰ provided that there has been a new crossing into that neighborhood
since the previous time such information was given for it.

To determine u, it suffices to reconstruct enough of M's configuration after its
computation on input uv so that we can check which additional input string $2u'$ of
length $1 + |u|$ leads to acceptance. The far tape contents suffice for this.
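
The final recovery step can be illustrated by a small brute-force search. The following is only a sketch under stated assumptions: `make_oracle` is a hypothetical stand-in for "resume M from its reconstructed configuration after reading uv", implemented here directly from the definition of the language, and `recover_u` is an illustrative name. Neither comes from the construction itself.

```python
from itertools import product
from typing import Callable


def make_oracle(uv: str) -> Callable[[str], bool]:
    """Hypothetical stand-in for resuming M from its configuration after uv.

    It accepts a continuation '2' + t exactly when t is a binary prefix of uv,
    which is how a recognizer for the language behaves after reading uv.
    """
    def accepts(continuation: str) -> bool:
        marker, tail = continuation[:1], continuation[1:]
        return marker == "2" and set(tail) <= {"0", "1"} and uv.startswith(tail)
    return accepts


def recover_u(accepts: Callable[[str], bool], u_length: int) -> str:
    """Try every binary string u' of the given length; keep the accepted one."""
    matches = [
        "".join(bits)
        for bits in product("01", repeat=u_length)
        if accepts("2" + "".join(bits))
    ]
    if len(matches) != 1:
        raise ValueError("expected exactly one accepted continuation")
    return matches[0]


if __name__ == "__main__":
    print(recover_u(make_oracle("1011" + "0010"), 4))  # prints 1011
```

The search takes exponential time, which is harmless here: the argument only needs u to be determined by the short description, not to be computed efficiently.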

Our reconstruction strategy is mostly to simulate M step-by-step, starting with 
the first close-to-far crossing. Toward this end, we strive to maintain the contents 
of any currently scanned close isolation neighborhood and of the entire far side. We 
temporarily suspend step-by-step simulation whenever a head shifts onto a close 
tape square not in any isolation neighborhood, and we aim to resume suspended 
step-by-step simulation whenever a head shifts onto a far tape square not in any 
isolation neighborhood. Because our matching is maximal, such a far tape square 
is not scanned at the time of suspension, and hence also not at any time before 
the desired resumption. It follows that the information for the needed updates is 
indeed available, so that resumption is indeed possible. Similarly, any necessary 
updates are possible if the step-by-step simulation happens to be suspended when 
the subcomputation ends. 

^9 To include all the semicolons, some of these "contiguous neighborhoods" might have to be
the empty neighborhoods of the semicolons.

^10 The other head must remain in some isolation neighborhood; otherwise, the matching could
be enlarged.

It remains only to show that $|u|/2$ bits suffice for our description of u in terms
of v. For each of the sequences in (1) and (2), the trick is to give only the first
number explicitly, and then to give the sequence of successive differences. The
length of this encoding is $O(m \log(n/m)) = O(n \log(1/\delta)/(1/\delta))$, which can be
limited to a small fraction of $|u|/2 \approx cn/(6d)$ by choosing $\delta$ small enough.
For (4), note that the contents of each isolation neighborhood are given at most once
for each of the $\ell$ crossings into and out of the neighborhood. For (3) and (4),
therefore, straightforward encoding requires only
$O(\log n + \ell(m + |S_\ell|)) = O(\log n + \ell\delta n + \varepsilon dn)$
bits, where the implicit constant is bounded in terms of d and c. This can be limited
to another small fraction of $|u|/2$ by choosing $\varepsilon$ small enough, $\delta$ small
enough, and n large enough. For the remaining information, M, $|u|$, and a description
of this whole discussion, we need only $O(\log n)$ bits, which can be limited to a final
small fraction of $|u|/2$ by choosing n large enough. □
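
The difference encoding used for sequences (1) and (2) can be sketched in a few lines. This is only an illustration under assumptions not fixed by the text: each difference is written in the Elias gamma code, which is one standard self-delimiting code, and the names `delta_encode`, `encoded_bits`, and `concavity_bound` are hypothetical. The bound follows from concavity of the logarithm: m nonnegative differences summing to at most n cost at most m(2 log2((n + m)/m) + 1) bits, which is O(m log(n/m)) when m is at most n.

```python
from math import log2
from typing import List


def delta_encode(positions: List[int]) -> List[int]:
    """Give the first number explicitly, then the successive differences."""
    if any(b < a for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be nondecreasing")
    if not positions:
        return []
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode by taking running sums."""
    positions, total = [], 0
    for d in deltas:
        total += d
        positions.append(total)
    return positions


def elias_gamma_length(k: int) -> int:
    """Number of bits in the Elias gamma code of a positive integer k."""
    if k < 1:
        raise ValueError("Elias gamma codes only positive integers")
    return 2 * (k.bit_length() - 1) + 1


def encoded_bits(positions: List[int]) -> int:
    """Total bits when each difference d is written as gamma(d + 1).

    The shift by one keeps zero differences encodable.
    """
    return sum(elias_gamma_length(d + 1) for d in delta_encode(positions))


def concavity_bound(m: int, n: int) -> float:
    """Upper bound on encoded_bits for m numbers, the largest at most n."""
    return m * (2 * log2((n + m) / m) + 1)
```

For example, the four positions 3, 10, 11, 40 become the differences 3, 7, 1, 29, and decoding the differences by running sums restores the original positions.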

7. Further Discussion and Remaining Questions 

In retrospect, our contribution has been a constraint on how a Turing machine 
with only two storage heads can recognize L in real time. Even if the two heads 
are on the same one-dimensional tape, such a Turing machine cannot recognize L
in real time unless it violates the conclusion of (the first sublemma of) Vitanyi's 
Far-Out Lemma (see Appendix below). Only in the latter do we ever really exploit 
an assumption that the two heads are on separate tapes.

Our result rules out general real-time simulation of a two-head tape unit using 
only a pair of single-head tapes. It remains to be investigated whether the result 
extends to some notion of probabilistic real-time simulation [cf., PSSN90]. Another 
extension might rule out simulation using three single-head tapes, yielding a tight 
result; but this would require a more difficult witness language. Perhaps allowing 
the "back" head of the defining two-head machine also to move and store random 
data, but much more slowly than the "front" head, would let us combine our argu- 
ments with those of Aanderaa [Aa74, PSS81, Pa82]. A slightly weaker possibility 
might be to show that two single-head tapes and a pushdown store do not suffice, 
and a slightly stronger one might be to show that even three single-head tapes and 
a pushdown store do not suffice. 

It might be even more difficult to rule out general real-time simulation of a
two-head one-dimensional tape unit using two or three higher-dimensional
single-head tapes. Our particular language L can be recognized in real time by a Turing
machine with just two such two-dimensional tapes: the idea is to strive to maintain
the n bits of data within an $O(\sqrt{n})$ radius on both tapes, along with $O(\sqrt{n})$
strategically placed copies of the first $O(\sqrt{n})$ bits, to serve as insurance alternatives
at the same time that the array of their left ends provides a convenient area for
temporary collection of data and for copying data between the tapes.

The implications for real-time simulation of one-dimensional tape units with
more than two heads remain to be investigated. For example, how does a three-head
tape compare with three single-head tapes or with one single-head tape and
one two-head tape? (Paul's results [Pa84] do answer such questions for tapes of
higher dimension.) How tight is the known bound of $4h - 4$ single-head tapes for
real-time simulation of one h-head (one-dimensional) tape [LS81]? Perhaps the
many-heads setting is the right one for a first proof that even an extra head is
not enough to compensate for the loss of sharing; e.g., can a 1000-head tape be
simulated in real time by 1001 single-head tapes, or by 1000 single-head tapes and
a pushdown store?

Finally, does any of this lead to more general insight into the heads or tapes
requirements for arbitrary computational tasks? I.e., when asked about some
computational task, can we tightly estimate the structure of the sequential storage that
suffices for the task?

Appendix: A proof sketch for Vitanyi's Far-Out Lemma 

Suppose two-tape Turing machine M recognizes the language in real time. With- 
out loss of generality, assume M's storage tape is only semi-infinite, and assume 
M writes only 0's and 1's. Let d be the delay of M.

Our ultimate goal is to show that both heads simultaneously range linearly far 
when the input is incompressible, but first we show that each one separately does 
so even when the input is just nearly incompressible. (The subsequent application 
is to significantly long prefixes of input strings that are not compressible at all.) It 
is only this part of the proof that requires the hypothesis that the two heads are 
on separate tapes. This part is based on the "bottleneck" argument that Valiev 
[Va70] (and, independently, Meyer [Me71]) used to show that no single-tape Turing
machine can accept the simpler language $\{\, x2x \mid x \in \{0,1\}^* \,\}$ in real time.
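
To keep the two languages apart, here is a minimal sketch of their membership conditions as ordinary string predicates. It says nothing about real-time recognition by a Turing machine; the function names are illustrative only, and the definition of L is the one stated in the abstract (x followed by 2 followed by a prefix of x).

```python
def in_simple_language(s: str) -> bool:
    """Membership in { x2x | x in {0,1}* }."""
    x, sep, y = s.partition("2")
    return sep == "2" and set(x + y) <= {"0", "1"} and y == x


def in_L(s: str) -> bool:
    """Membership in L = { x2x' | x in {0,1}*, x' a prefix of x }."""
    x, sep, y = s.partition("2")
    return sep == "2" and set(x + y) <= {"0", "1"} and x.startswith(y)
```

Every string of the simpler language is also in L, since x is a prefix of itself; L additionally contains strings such as 0110201, where the part after the 2 stops early.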

Suppose $\varepsilon$ is small in terms of M and d, n is large in terms of all of the above,
and x is of length n and nearly incompressible ($K(x) > n - \sqrt{n}$). We want to show
that each head ranges farther than $\varepsilon n$.

Suppose the head on one of the tapes, say the first, does not range farther than
$\varepsilon n$. Then the head on the second tape must certainly range farther than, say, $n/3$.
(Otherwise, the total state after storage of x is a too-short description of x.) Let
uvw be the parse of x with uv the shortest prefix of x that leaves M's second head
at least $n/3$ tape squares out, and with $|u| = n/(9d)$, so that that same head gets
no farther than $n/9$ tape squares out during input of u. On that head's tape, there
must be a "bottleneck" boundary between $n/9$ and $2n/9$ tape squares out that gets
crossed at most $9d$ times. Since all of u gets read when the second head is to the
left of this bottleneck, it is possible to describe x = uvw in terms of vw and the
bottleneck's "crossing sequence", which should include, for each crossing, the step
number and the "crossing state", which in turn should include the complete but
relatively small contents of the first storage tape at the time of the crossing. The
following information suffices:

1. vw, 

2. a description of this discussion, 

3. a description of M, 

4. the value of n, 

5. the location of the bottleneck, 

6. the crossing sequence at the bottleneck. 

If we provide vw as a literal suffix, then we can limit the length of this description
to little more than $n - |u|$ bits, contradicting the near incompressibility of x. To
recover u, we can use the information to determine enough of M's instantaneous
description after reading uv (omitting from the i.d. only what is to the left of
the bottleneck on the second tape) to then try each input continuation $2u'$ with
$|u'| = n/(9d)$.
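
The two counts used in the bottleneck argument (the head stays within n/9 squares while u is read, and some boundary is crossed at most 9d times) can be checked by a short calculation. This sketch assumes that delay d means at most d steps per input symbol and that a head moves at most one square per step. The symbols B (the set of boundaries between n/9 and 2n/9 squares out) and c_b (the number of crossings of boundary b) are introduced here only for illustration.

```latex
% Assumptions: at most d steps per input symbol; one square per head move.
% B = boundaries between n/9 and 2n/9 squares out; c_b = crossings of boundary b.
\begin{align*}
\text{steps while reading } u &\le d\,|u| = d \cdot \frac{n}{9d} = \frac{n}{9}, \\
\sum_{b \in B} c_b &\le dn, \qquad |B| \approx \frac{n}{9}, \\
\min_{b \in B} c_b &\le \frac{dn}{n/9} = 9d .
\end{align*}
```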

Finally, we return to our ultimate goal. Here is the idea: If the heads do not both 
go far out together, then they must take turns, so that some region gets crossed 
many times; abbreviate the symbols read while a head is in that region. 

Suppose $\varepsilon$ is small in terms of M and d (as above), $\varepsilon_2$ is small in terms of
the preceding parameters (in particular, $\varepsilon_2 \ll \varepsilon$), $\varepsilon_1$ is small in terms
of the now preceding parameters (in particular, $\varepsilon_1 \ll \varepsilon_2$), n is large in terms
of all of the above, and x is of length n and incompressible. We want to show that
both heads range farther than $\varepsilon_1 n$, simultaneously.

Suppose, to the contrary, that there is always at least one head within $\varepsilon_1 n$ tape
squares of the origin. We count the crossings of the region from $\varepsilon_1 n$ to $\varepsilon_2 n$: It
follows from our assumptions that a left-to-right crossing must occur between input
symbol number $(d/\varepsilon)^i (\varepsilon_2 n/\varepsilon)$ and input symbol number
$(d/\varepsilon)^{i+1} (\varepsilon_2 n/\varepsilon)$, for every i. (We use the fact that these input prefixes
are themselves nearly incompressible.) By input symbol number n, therefore, the number
of complete crossings (either direction) is at least
$r = 2 \log_{d/\varepsilon}(\varepsilon/\varepsilon_2)$ (which is large because $\varepsilon_2$ is so
small).

There is a complication, however: There might also be partial crossings, involving
fewer input symbols but additional overhead in the description we plan to give. To
control this problem, we shrink the region slightly, replacing $\varepsilon_1$ and $\varepsilon_2$ with
$\varepsilon'_1$ and $\varepsilon'_2$ from the first and last quarters, respectively, of the range
$[\varepsilon_1, \varepsilon_2]$, chosen so that each of the boundaries $\varepsilon'_1 n$ and $\varepsilon'_2 n$ is
crossed at most $R = 8d/\varepsilon_2$ times. This is possible, since
$R(\varepsilon_2 - \varepsilon_1)n/4$ exceeds $dn$.
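
The reason such boundaries exist can be spelled out as a counting step. This is a sketch, writing $\varepsilon_1$ and $\varepsilon_2$ for the two small parameters and assuming that the whole computation takes at most dn steps and that each step crosses at most one boundary on a given tape. If every boundary in one quarter of the range were crossed more than R times, the crossings would exceed the available steps:

```latex
% Assumptions: at most dn steps in total; each step crosses at most one
% boundary on a given tape; R = 8d / \varepsilon_2; \varepsilon_1 \le \varepsilon_2 / 2.
\[
\text{crossings}
  \;>\; R \cdot \frac{(\varepsilon_2 - \varepsilon_1)\,n}{4}
  \;=\; \frac{8d}{\varepsilon_2} \cdot \frac{(\varepsilon_2 - \varepsilon_1)\,n}{4}
  \;=\; 2dn\left(1 - \frac{\varepsilon_1}{\varepsilon_2}\right)
  \;\ge\; dn .
\]
```

The last inequality holds whenever $\varepsilon_1 \le \varepsilon_2/2$, which is guaranteed by $\varepsilon_1 \ll \varepsilon_2$.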

Finally, then, we formulate a description of the incompressible input that differs
from the completely literal one as follows: We eliminate the input read while a
head is in the range between $\varepsilon'_1 n$ and $\varepsilon'_2 n$, for a savings of at least
$r(\varepsilon'_2 - \varepsilon'_1)n/d > r(\varepsilon_2 - \varepsilon_1)n/(2d)$ bits. We add descriptions of
the crossing sequences at these two boundaries, including times, states, and the tape
contents out to boundary $\varepsilon_1 n$, and also the full final contents of the tape squares
between the two boundaries, for a total cost of

$$O((\varepsilon'_2 - \varepsilon'_1)n + R(\log n + \varepsilon_1 n)) = O((\varepsilon_2 - \varepsilon_1)n + d(\log n + \varepsilon_1 n)/\varepsilon_2) = O((\varepsilon_2 - \varepsilon_1)n)$$

bits, which can be kept significantly smaller than the savings. □

Abstract. We show that a Turing machine with two single-head one-dimensional
tapes cannot recognize the set

$$\{\, x2x' \mid x \in \{0,1\}^* \text{ and } x' \text{ is a prefix of } x \,\}$$

in real time, although it can do so with three tapes, two two-dimensional tapes,
or one two-head tape, or in linear time with just one tape. In particular, this
settles the longstanding conjecture that a two-head Turing machine can recognize
more languages in real time if its heads are on the same one-dimensional
tape than if they are on separate one-dimensional tapes.
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